and assessing these explanations for plausibility. Previous research has
hypothesis retrieval process which is based on a model for hypothesis
retrieval developed by Gettys, Fisher, and Mehle (1978). A computer simulates
the human hypothesis retrieval process by searching an enriched associative
memory which contains the associations of a number of individuals in the form of
lists of hypotheses for each datum. When the data of a decision problem become
known, the appropriate lists are searched by the computer. Hypotheses that are
common to most or all of the lists are suggested to the user, who assesses them
for plausibility. An experiment was performed to determine the utility of the
aid for both expert and non-expert users. The aid produced a substantial gain
in performance for both groups of users, suggesting that further development of
the aid would be worthwhile in decision situations which are repeated often
enough to warrant the creation of an enhanced artificial memory. Also discussed
are several techniques for implementing the aid and for determining the maximum
gain in performance that the aid can produce.



A memory retrieval aid to enhance hypothesis generation performance


The structuring of a decision problem is a vital precursor to the actual
decision, since model specifications are used in all further analyses of the
decision problem. If the decision maker fails to consider relevant hypotheses
or acts in the decision model, then the entire decision process can go awry
because the model employed is incomplete or faulty.

Recently we have become convinced that the process of hypothesis generation,
one of the vital constituents of problem structuring, is quite inefficient in
non-routine tasks involving many possible hypotheses. For example, Gettys,
Fisher and Mehle (note 1) found that hypothesis generation performance is quite
impoverished; only about 50% of their subjects were able to generate two of the
three relevant hypotheses in a hypothesis generation task. Curiously, while
subjects were unable to retrieve complete hypothesis sets from memory, their
assessment of the completeness of these sets was quite optimistic (Gettys,
Mehle, and Fisher, submitted). Evidently the subjects believed that the
hypothesis sets were more complete than they actually were because hypotheses
that have not been generated are relatively unavailable in memory (Tversky and
Kahneman, 1974). The inability to generate all relevant hypotheses, coupled
with the belief that more of the relevant hypotheses have been generated than
is actually the case, makes the subjects particularly vulnerable. Unaware of
their deficiencies, they are, in effect, "fat and happy".


Since subjects fail to retrieve enough relevant hypotheses from memory, and are
often unaware of their failure, it would be profitable to develop an aid for
hypothesis generation. The hypothesis generation aid we propose is identical in
logical structure to the hypothesis retrieval model developed by Gettys, Fisher
and Mehle (1978) to describe the retrieval of hypotheses from human memory.
However, it differs in that its associative memory is enhanced because it
combines, or pools, the associations of a number of individuals by using a
computer. Thus the aid is able to retrieve additional relevant hypotheses that
were not retrieved by the user because the aid, in effect, searches the memory
of many individuals, counteracting the inefficiencies in the memory of the
individual user. The supplemental hypotheses provided by the aid should yield
a larger, improved hypothesis set.

This aid is best suited for repetitive decision situations, or other situations
where it is deemed worthwhile to go to the effort of constructing an artificial
memory in advance. Some decision situations occur frequently, such as automotive
and electronic troubleshooting, or medical diagnosis; other situations may
never have occurred, but are objects of much advance planning and thought.
These latter situations, such as planning for a possible melt-down of a nuclear
reactor, have possible repercussions that are so profound that advance planning
is conducted. In either of these types of situations the effort of constructing
an artificial memory to aid hypothesis generation may be warranted. While the
gain from each decision in a repetitive decision situation may be relatively
small, there are many of these decisions that could be aided. In the case of
the latter type of decision, whose repercussions may be profound, the effort to
construct an aid to hypothesis generation may be worthwhile even if it is
never used.
The aid is implemented by constructing an enhanced artificial memory in
advance. While there are other techniques for constructing the artificial
memory than the pooling process mentioned here, their discussion is deferred to
a later point in the paper. The pooling process involves asking a number of
individuals to search their memories for hypotheses that are associated with a
datum. From these varied associations, a list that is rich in hypotheses is
constructed by pooling the hypotheses of the contributors to the list. This
process is repeated for each datum that is anticipated in the environment. The
result is an enhanced associative memory where the list for each datum consists
of the associations between that datum and a number of hypotheses.
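The pooling step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program used in the study (which was written in BASIC). The function name `build_memory`, the dictionary layout, and the example responses are all hypothetical. Each datum's pooled list is the union of every hypothesis that any contributor named for that datum.

```python
def build_memory(responses):
    """Pool contributors' hypothesis lists into one enriched list per datum.

    responses maps each datum to a list of per-contributor hypothesis lists.
    A hypothesis enters a datum's pooled list if any contributor named it.
    """
    memory = {}
    for datum, contributor_lists in responses.items():
        pooled = set()
        for hypotheses in contributor_lists:
            pooled.update(hypotheses)
        memory[datum] = pooled
    return memory


# Hypothetical responses from three contributors to a single datum.
responses = {
    "Chemistry 1314": [
        ["Chemistry", "Pharmacy"],
        ["Chemistry", "Engineering"],
        ["Zoology", "Pharmacy"],
    ],
}
memory = build_memory(responses)
print(sorted(memory["Chemistry 1314"]))
# ['Chemistry', 'Engineering', 'Pharmacy', 'Zoology']
```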

Once the artificial memory is stored in the computer, it is ready for use. After
the arrival of a datum, or a collection of data, a search is made of the list
of possible hypotheses for each new datum, and hypotheses that appear on many
of the lists are noted and added to a list of computer-generated potential
hypotheses. This list of hypotheses is then compared to the list of hypotheses
that the user has generated. Hypotheses that the computer retrieved from its
memory but that the user did not retrieve are displayed to the user. Finally,
these additional hypotheses are assessed for plausibility by the user and added
to the current hypothesis set if the user finds them plausible.
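The search step can be sketched as follows. This is a minimal sketch with several assumptions: the pooled memory is a dictionary from datum to a set of hypotheses, and `min_lists` is a hypothetical parameter giving the number of lists a hypothesis must appear on before it is suggested. Hypotheses that the user has already generated are removed before display, as described above.

```python
from collections import Counter


def suggest(memory, data, user_hypotheses, min_lists):
    """Hypotheses found on at least `min_lists` of the problem's lists,
    minus those the user has already generated, sorted for display."""
    counts = Counter()
    for datum in data:
        # Each list is a set, so a hypothesis is counted at most once per list.
        counts.update(memory.get(datum, set()))
    candidates = {h for h, n in counts.items() if n >= min_lists}
    return sorted(candidates - set(user_hypotheses))


# Hypothetical pooled lists for a three-datum problem.
memory = {
    "Chemistry 1314": {"Chemistry", "Pharmacy", "Engineering", "Zoology"},
    "Chemistry 3053": {"Chemistry", "Pharmacy", "Microbiology", "Zoology"},
    "Mathematics 1513": {"Business", "Engineering", "Education"},
}
print(suggest(memory, list(memory), ["Chemistry", "Pharmacy"], min_lists=2))
# ['Engineering', 'Zoology']
```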

This aid is similar in logical structure to early medical diagnosis aids. These
aids were unsuccessful because they employed deterministic inference in a
probabilistic task. However, such an aid is viable when the task is aiding
memory retrieval, because the aid makes no attempt to engage in probabilistic
inference. Since hypotheses are retrieved for subsequent evaluation by the
user, the difficulties that were encountered in medical diagnosis are avoided.

The primary empirical question to be addressed in this study is the assessment
of the actual gain in hypothesis generation performance that the aid provides
when hypotheses suggested by the aid are assessed by the user. One variable
that should have an effect on the improvement produced by the aid is the
expertise of the user. Non-expert users should show the biggest gain in
performance, and expert users should show more modest gains. This manipulation
also addresses the question of whether non-expert subjects can serve as
"surrogates" for scarce expert subjects.

A second variable of interest is the number of data in the hypothesis
generation tasks. The aid does not contain a unique list of hypotheses for
every possible combination of data that could occur. If, for example, there
were 100 possible data, then the number of lists that would be necessary
would be two to the hundredth power. Instead, only 100 lists of
hypotheses would be created. Hypotheses that are appropriate for multiple data
are found by searching for hypotheses that are common to the lists for the data of
that problem. The advantage of this latter procedure is a tremendous reduction
in the effort to construct the artificial memory of the aid, but the rule for
finding hypotheses suggested by multiple data may be inefficient. Consequently,
we decided to study the aid in both a single-datum and a multiple-data case.
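The saving is easiest to see in figures. With 100 possible data, storing a separate list for every combination of data would require one list per subset of the data. The count below includes the trivial empty subset, which makes no difference at this scale.

```latex
\sum_{k=0}^{100} \binom{100}{k} = 2^{100} \approx 1.27 \times 10^{30}\ \text{lists},
\qquad \text{versus } 100 \text{ lists, one per datum.}
```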

Accordingly, the design of the aid study incorporates a between-subjects
variable of expertise of the user, and within-subjects variables of number of
data and of whether or not a particular hypothesis generation problem is aided.
An additional non-aided control condition is incorporated in the design to
assess irrelevant differences between the non-expert and the expert groups.
This condition employs a hypothesis generation task in a domain where experts
and non-experts have comparable amounts of background and knowledge, in order to
assess differences in performance of the two groups due to nuisance variables
unrelated to expertise, such as intelligence and motivation.

Method


Method and procedure for the aiding experiment.

Hypothesis generation tasks. The hypothesis generation tasks chosen for the
aid experiment are the "Majors" task and the "Animals" task (Gettys, Fisher,
and Mehle, 1978). In the Majors task the subject is given several courses that
a University of Oklahoma (OU) student has taken, and is asked to generate a
list of plausible majors for this student. The Animals task is similar, except
that the subject is asked to generate a list of plausible animals from several
animal characteristics.

The Majors task was chosen because we have access to the veridical posterior
probabilities of various majors given the classes that an OU student has taken.
The availability of the veridical probabilities makes the evaluation of the aid
possible, as they provide the information necessary to create objective indices
of performance. The veridical probabilities were not used in the construction
or the operation of the aid. The veridical probabilities are population values;
146,858 enrollment records at the University of Oklahoma covering a four-year
period were tabulated to obtain these values.

Problems in the Majors task had either 1 or 3 data, and were aided on 50% of
the trials. The Animals task, in which possible animal hypotheses were generated
from animal characteristics, served as a control task and was not aided.

An example of a Majors problem is now described. This is a three-data problem;
the three data are: 1) Chemistry 1314-General Chemistry, 2) Chemistry
3053-Organic Chemistry, and 3) Mathematics 1513-College Algebra. The list of
hypotheses generated by a randomly chosen expert subject included the following
majors: Chemistry, Engineering, Pharmacy, and Physics. There were in fact 12
majors which had non-negligible probabilities for this problem. These majors
and their veridical percentages are: Business (4.1%), Chemistry (6.2%),
Education (2.1%), Engineering (4.1%), Laboratory Technology (9.3%), Liberal
Studies (2.1%), Medical Technology (4.1%), Microbiology (6.2%), Pharmacy
(21.6%), Psychology (9.2%), University College (11.3%), and Zoology (10.3%).
University College is an "undeclared" major for beginning students. The
remaining majors all had percentages of less than 2%. The sum of the
percentages of majors with percentages greater than 2% is 90.6%. A subject
who achieved this sum would have performed optimally in this task, as the
subjects were instructed to respond with all hypotheses greater than 2%. As
can be calculated from the percentages of the hypotheses that the subject
generated, the subject's performance was 31.9%, and the subject failed to
generate several important hypotheses such as Laboratory Technology,
Microbiology, Psychology, University College, and Zoology. This performance is
typical of the average subject for this problem. The number 31.9% has a direct
theoretical interpretation; it is the probability that the subject's list of
majors, or hypothesis set, contains the correct hypothesis. Thus, for this
problem, this subject would have failed to consider the correct hypothesis with
a probability of 68.1%.
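The arithmetic in this example can be checked with a short Python sketch. The percentages are taken from the text. Physics is assigned 0 because the text places it among the majors below 2%, and the helper name `set_probability` is hypothetical.

```python
# Veridical percentages for the example problem, as reported above.
# Majors below 2% are omitted and contribute (approximately) 0.
veridical = {
    "Business": 4.1, "Chemistry": 6.2, "Education": 2.1,
    "Engineering": 4.1, "Laboratory Technology": 9.3,
    "Liberal Studies": 2.1, "Medical Technology": 4.1,
    "Microbiology": 6.2, "Pharmacy": 21.6, "Psychology": 9.2,
    "University College": 11.3, "Zoology": 10.3,
}


def set_probability(hypotheses, percentages):
    """Percentage chance that the hypothesis set contains the correct major."""
    return sum(percentages.get(h, 0.0) for h in set(hypotheses))


subject = ["Chemistry", "Engineering", "Pharmacy", "Physics"]
p = set_probability(subject, veridical)
optimum = set_probability(veridical, veridical)  # every major above 2%
print(round(p, 1), round(optimum, 1), round(100 - p, 1))
# 31.9 90.6 68.1
```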

An example of an Animals problem was to name animals that have antlers. The
responses of the same subject to this problem were: deer, moose, antelope, and
reindeer. The reader is invited to generate additional hypotheses for this
problem.

Apparatus. The experiment was controlled by a Compucolor computer with a
color graphics capability, which was programmed in extended BASIC.

Subjects. The subjects in this experiment were drawn from two populations.
Non-expert subjects were University of Oklahoma students who were required to
have completed at least 60 hours of course work at the University and to have
typing skills. These students were recruited from classes and newspaper
advertisements, and were paid $5.00 for their participation. There were 16
subjects in this group.

The expert subjects were University of Oklahoma Curriculum Advisors. Various
Colleges and Departments of the University maintain advising offices and employ
individuals with a job title of "curriculum advisor" who are expert on
University, College, and Departmental requirements, and on the course offerings of
the University. This group of experts are professional student advisors who
work with student schedules on a daily basis. There are about 30 such advisors,
and we recruited 16 for this experiment. These subjects were paid a $10.00
honorarium for their participation in addition to their usual salary.




Instructions to subjects. The instructions to subjects were elaborate. First,
written instructions explaining the experiment and the aid were presented on
the computer. It was explained that the possibilities suggested by the aid were
to be carefully assessed, and that the subjects should use their best judgment in
deciding whether or not to include the aid's suggestions on their list of
hypotheses. A particularly pertinent section of these instructions is
reproduced below:


We are investigating a computer aid for memory in this study. We have
found that people sometimes fail to remember relevant information in
certain situations. The computer aid acts as a prompt for memory. We
are interested in learning how useful the aid is in helping people
search their memories.

As we are interested in the extent to which the aid helps you search
your memory, it is vitally important that you understand everything
about the experiment. For this reason, we want you to ask questions
whenever you need clarification. We will be happy to explain any
aspect of the experiment to you.

One of the first things that a doctor does before making a diagnosis
is to make a mental list of the possible diseases that the patient
might have, based on the patient's symptoms. If this list does not
include the disease that the patient has, the doctor's diagnosis is
bound to be wrong. So coming up with a complete list of possibilities
is very important, and we are studying an aid that should help people
create a more complete list.

Instead of investigating medical diagnosis, which requires special
expertise, we have chosen similar problems which have the same
characteristics. Some of these problems involve generating a list of
possible majors for an unknown OU student on the basis of courses
that this student has taken. For example, if you knew that the
unknown student had taken 7 hours of Zoology, you would probably
include Biological Science majors on your list, such as Zoology and
Botany. The student could also be a Psychology major who took these
courses as part of a Pre-med program, or even an Art major who is
fascinated by Zoology. Art, of course, is not nearly as likely, but
it is possible. Many other possible majors exist. Can you think of
any? How likely are they?


To cut the task of generating this list down to manageable size, you
need not add possible but highly unlikely Majors (such as Art) to
the list you will generate. If the chances of a particular Major are
less than 2% you should not add it to your list, but all Majors which
are more likely than 2% should be included on your list.

One way of making this clearer is to imagine that all the
non-transfer students who had taken these Zoology courses for the
last several years were assembled in a large auditorium. Students are
seated by Major, under large signs giving their Majors. Some Majors
will have many students, others will have only a few, or none. Your
task will be to list all the Majors which include more than 2% of the
total number of students in this room. If this isn't perfectly clear,
now would be a good time to discuss this with the experimenter.


Other problems will involve making lists of animals from their
characteristics. Use the 2% rule here also. If the animal having the
specified characteristics is quite rare, you need not add it to your
list.


Following the written instructions, the subjects worked three practice problems
using the same procedure as in the main experiment. There was one of each type
of problem used in the main experiment: an unaided "Majors" problem, an aided
"Majors" problem, and an "Animals" problem were included in the practice set so
that the subjects would have experience with all of the types of problems to be
encountered in the main session.

Design of the study. The design is a 2 by 2 by 2 mixed factorial where
expertise is a between-groups variable, and number of data and aiding are
within-groups variables. There were four one-datum problems and four
three-data problems. Each subject was aided on 50% of the problems,
counterbalanced across number of data so that each problem was aided equally
often for each group. The two "Animals" problems were included with the eight
"Majors" problems, so each subject worked a series of ten problems in the main
part of the experiment. The order of the ten problems was randomized for each
subject.

Procedure. Following the instructions, the subjects worked ten problems at
their own pace. The experimental session typically lasted between one and one
and a half hours. The data of each problem were displayed on the video screen of
the computer, and the subject's answers were typed on the computer keyboard. For
the "Majors" problems a spelling check was made by the computer. When the
subject entered a major, it was compared to a list of the 63 possible majors. If
an exact letter-for-letter match was found, the major was added to the
subject's list of plausible majors. If this match failed, the computer
executed a routine in which it attempted to identify the entry. If a major closely
approximating the subject's entry was found, the computer asked for
confirmation that this major was in fact the one that the subject intended. The
subject continued to enter majors until the subject believed that all the
majors which included more than 2% of the students who had taken the specified
courses had been identified. Then the subject entered "NONE" into the computer.
If the problem was unaided for that subject, the program began the next
problem. If the problem was aided, the aiding display was generated.
Subjects were unaware that a particular problem was aided until this point; this
controlled for the possibility that they might rely on the aid if they knew in
advance that a particular problem was aided.
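The text does not describe how the original routine identified a misspelled entry. The sketch below therefore substitutes Python's standard `difflib.get_close_matches` as the similarity test; it is not the study's method. The short `MAJORS` list is a hypothetical stand-in for the 63 majors. The `confirm` parameter is also hypothetical: it defaults to `input` so the subject can be asked to confirm a guess, and a scripted function can be passed in its place.

```python
import difflib

# Hypothetical subset of the 63 majors; the study's full list is not reproduced here.
MAJORS = ["Chemistry", "Engineering", "Pharmacy", "Psychology", "Zoology"]


def identify_major(entry, majors=MAJORS, confirm=input):
    """Return the major the subject intended, or None if it cannot be identified.

    An exact letter-for-letter match is accepted directly; otherwise the closest
    spelling (if any) is offered to the subject for confirmation.
    """
    if entry in majors:
        return entry
    close = difflib.get_close_matches(entry, majors, n=1, cutoff=0.6)
    if close:
        answer = confirm(f"Did you mean {close[0]}? (Y/N) ")
        if answer.strip().upper().startswith("Y"):
            return close[0]
    return None


# Usage with a scripted confirmation instead of keyboard input:
print(identify_major("Pharmcy", confirm=lambda prompt: "Y"))  # Pharmacy
```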

If a problem was aided, the list of "Majors" that the subject generated, the
data of that problem, and the "Majors" suggested by the aid were displayed. Any
majors that the subject generated were removed from the aid's list before it was
displayed to the subject. The subject had been told during the instructions and
the practice problems that this aiding list was generated by the computer on
the basis of other people's responses, and that it was to be searched for majors
greater than 2% to be added to the list. The majors suggested by the
aid were numbered. Subjects indicated which majors were to be added to their
list by typing the number of the major. They could adopt several majors by
entering the numbers associated with the majors, separated by plus signs. Any
majors so adopted were transferred from the aiding list to the list of
responses adopted by the subject on the display. This process was repeated
until the subject entered the number indicating that all the desired transfers
to their list had been made. When all 10 problems had been worked, the session
ended and the subjects completed a short questionnaire concerning the aid.
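The adoption step, in which an entry such as `1+3` transfers several numbered suggestions at once, might be sketched as below. Both function names are hypothetical. The text does not give the number that ended the transfers, so this sketch handles a single entry and leaves out the repeat loop.

```python
def parse_adoptions(entry, n_suggested):
    """Turn an entry such as '1+3' into zero-based indices of suggested majors.

    Tokens that are not numbers in 1..n_suggested are ignored.
    """
    indices = set()
    for token in entry.split("+"):
        token = token.strip()
        if token.isdigit() and 1 <= int(token) <= n_suggested:
            indices.add(int(token) - 1)
    return indices


def adopt(entry, suggested, subject_list):
    """Move the adopted majors from the aiding list to the subject's list."""
    chosen = parse_adoptions(entry, len(suggested))
    adopted = [m for i, m in enumerate(suggested) if i in chosen]
    remaining = [m for i, m in enumerate(suggested) if i not in chosen]
    return remaining, subject_list + adopted


remaining, responses = adopt("1+3", ["Zoology", "Psychology", "Microbiology"], ["Pharmacy"])
print(remaining, responses)
# ['Psychology'] ['Pharmacy', 'Zoology', 'Microbiology']
```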

All hypotheses that the subject generated were recorded, as were the hypotheses
suggested by the aid and adopted by the subject. The basic index of performance
for aided and unaided responses is the sum of the posterior probabilities
associated with these majors, as illustrated in the example problem.


Creation of the aid.

The aid is created by generating a list of possible hypotheses for each datum
and storing these lists in a computer for future access. In principle, there
are many ways that these lists can be generated. In situations which are
relatively well understood, such as automotive and electronic troubleshooting,
authoritative sources of information can be consulted to generate these lists.
Alternatively, historical records can be consulted to provide this information.


In other situations, where such authoritative sources of information are
unavailable, it is possible to generate these lists by pooling hypotheses
generated by knowledgeable individuals. We chose the latter technique for this
study because we believe it is likely to be the easiest to execute in an
applied setting. The pooling process is essential because any individual may
have lapses in hypothesis generation and generate an incomplete list of
hypotheses. If, however, several individuals contribute to a list, hypotheses
which one individual fails to retrieve are often retrieved by another
individual, because of differences in their experiences and because lapses in
retrieval from memory by one individual may be offset by successful retrieval by
other individuals.

Once the lists of hypotheses for each datum are created and stored in the
computer, the aid is ready for use. Its memory is then accessed and searched
for hypotheses that are common to several of the lists. Hypotheses that occur
more frequently than a threshold are presented to the decision maker for
assessment, and are adopted if they are sufficiently plausible.

The expertise and number of contributors to the lists are important variables
which affect the quality of the lists. First, it is desirable that the lists be
generated by experts, as the hypothesis set of an expert should be larger than
that of a non-expert due to the expert's greater knowledge and experience.
Second, as even experts have lapses of memory, several experts should
contribute to each list. An increase in the number of contributors should
enhance the list to the extent that their experience and knowledge differ. The
choice of the number of experts to consult is primarily governed by their
expertise and the importance of the problem. An increase in the number of
contributors to the list should partially compensate for a lack of a high
degree of expertise on the part of the contributors. On the other hand, if the
contributors are experts of the highest quality, then a smaller number of
contributors should be sufficient.

Contributors used in this experiment. In the present study, non-experts were
used to generate the lists, since there were not enough experts both to do this
task and to participate in the experiment. The non-experts used were students in
Experimental Psychology at the University of Oklahoma. This course is typically
taken by upperclassmen due to its prerequisites. Thirteen students generated a
list in response to each datum, and their lists were pooled. A hypothesis was
included on the list if it was given by any of the 13 contributors to the list.

The Delta P metric. Once the aid has been generated, it is important to
assess its adequacy. We have developed a technique for studying the adequacy of
the lists used in the hypothesis generation aid. This technique has a variety
of potential applications. First, it is possible to characterize the maximum
gain that could be expected using the aid if the hypothesis assessment of the
user were perfect. Second, the performance of an unaided user can be roughly
estimated using the memory tagging model of Gettys, Fisher, and Mehle (1978).
Third, it is possible to decide how many contributors to each list should be
used. These ideas may have considerable practical importance if this aid proves
to be successful in certain situations. Unfortunately, these techniques require
veridical posterior probabilities, which may be difficult or impossible to
obtain in some situations.

This metric is named Delta P, and it is based on the following ideas. Each
contributor to the lists is given a datum and is asked to generate a list of
possible hypotheses. Each of these lists can be characterized by the
probability, P, that the correct hypothesis is contained in the list. P is
calculated by summing the posterior probabilities for each hypothesis on the
list; this sum is the probability that the list will contain the correct
hypothesis. The value of P is less than 1.0 to the extent that the list is not
exhaustive, or lacks relevant hypotheses. Various contributors to the list do
not generate exactly the same hypotheses, so pooled lists should contain more
plausible hypotheses than any individual's list and will have a greater value
of P. This technique can readily be generalized to N individuals. The
difference in P between an individual list and the pooled list resulting from N
individuals is termed Delta P, which is the gain in P resulting from the
pooling process. These elementary considerations yield several interesting
results.
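The definitions of P and Delta P translate directly into code. This is a minimal sketch: the posterior probabilities and contributor lists below are made-up illustrations, not data from the study, and the function names are hypothetical.

```python
def list_probability(hypotheses, posterior):
    """P: probability that a hypothesis list contains the correct hypothesis."""
    return sum(posterior.get(h, 0.0) for h in set(hypotheses))


def delta_p(individual, contributor_lists, posterior):
    """Gain in P from the pooled list of N contributors over one individual's list."""
    pooled = set().union(*contributor_lists)
    return list_probability(pooled, posterior) - list_probability(individual, posterior)


# Hypothetical posterior probabilities and three contributors' lists.
posterior = {"A": 0.5, "B": 0.3, "C": 0.1, "D": 0.1}
lists = [["A"], ["A", "B"], ["C"]]
print(round(delta_p(lists[0], lists, posterior), 2))  # 0.4
```

Here the pooled list {A, B, C} has P = 0.9. The first contributor's list has P = 0.5, so pooling gains 0.4.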

Estimating unaided performance. First, if P is calculated for each of the
contributors, the average value of P is an estimate of the performance of an
unaided user for a particular datum. The memory tagging model of Gettys, Fisher
and Mehle can be used to make a rough estimate of unaided performance for
multi-data problems by Monte Carlo techniques.

Estimating the maximum possible gain from the aid. The Delta P value
resulting from pooling the hypothesis lists of all contributors is an estimate
of the maximum possible gain for users of the same level of expertise as the
contributors. This gain may not be realized in practice if the aided user does
not exploit the full potential of the aid, but if the aid shows a small value
of Delta P in a given situation, then the aid will be of little or no use in
that situation.


Estimating the number of contributors to the aid. By varying the number of
contributors, N, from one to its maximum value, and calculating P for each
possible value of N, a negatively accelerated curve in P is traced out. This
analysis can be performed by Monte Carlo techniques, where the lists of the
various contributors are randomly chosen, or by an exhaustive analysis, where
all possible combinations of contributors are assessed.
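The exhaustive version of this analysis can be sketched in Python. The sketch also ties together the two preceding estimates. The N = 1 point of the curve is the average individual P, which estimates unaided performance for a single datum. The endpoint minus the N = 1 point is the Delta P for pooling all contributors, which estimates the maximum possible gain. The data and function names are hypothetical. For large pools, randomly sampling combinations (the Monte Carlo variant) would replace enumerating them with `itertools.combinations`.

```python
from itertools import combinations
from statistics import mean


def list_probability(hypotheses, posterior):
    """P: summed posterior probability of the hypotheses on a list."""
    return sum(posterior.get(h, 0.0) for h in set(hypotheses))


def p_curve(contributor_lists, posterior):
    """Mean P of the pooled list for each N from 1 to the number of contributors,
    averaged over every combination of N contributors (exhaustive analysis)."""
    curve = {}
    for n in range(1, len(contributor_lists) + 1):
        curve[n] = mean(
            list_probability(set().union(*combo), posterior)
            for combo in combinations(contributor_lists, n)
        )
    return curve


posterior = {"A": 0.5, "B": 0.3, "C": 0.1, "D": 0.1}  # hypothetical
lists = [["A"], ["A", "B"], ["C"]]
curve = p_curve(lists, posterior)
print({n: round(p, 3) for n, p in curve.items()})
# {1: 0.467, 2: 0.767, 3: 0.9}
max_gain = curve[len(lists)] - curve[1]  # all contributors vs. an average individual
```

The successive increments (0.3, then about 0.13) shrink as N grows, which is the negatively accelerated shape described above.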

Setting the threshold values of the aid. There are two threshold values that
affect the performance of the aid. By adjusting these values, the hypothesis
set that the aid produces can be varied at will. First, the criterion for
including a hypothesis on the list can be varied. In the present study this
criterion was set so that if any of the 13 contributors to the aiding lists
suggested a major, it was included in the lists that the computer
searched. Such a criterion admits many majors to the list that are quite
unlikely, but maximizes the number of relevant hypotheses included on the list.
We chose this criterion because it is possible to calculate what aided
performance would have been if a more stringent criterion had been employed,
and so we are able to examine the performance of the aid with various criteria.
The second criterion that must be determined is the rule to be used by the
computer when searching the lists for hypotheses that are common to several of
them. For the one-datum problems the choice is forced, as only one list is
searched. For the three-data problems we picked a criterion that a major must
appear on at least two of the three lists before it is suggested to the
subject. We chose a value of two because previous research (Gettys, Fisher, and
Mehle, 1978) suggests that subjects retrieve a hypothesis from memory when it
is tagged for two out of three data.
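The two thresholds can be expressed as simple counting rules. In this hypothetical sketch, `k` is the minimum number of contributors who must name a hypothesis for it to enter a datum's list (k = 1 in the present study), and `m` is the minimum number of data lists on which it must appear before it is suggested (m = 1 for one-datum problems, m = 2 for three-data problems). The function names and data are illustrative assumptions, not the study's program.

```python
from collections import Counter


def build_aid_list(contributor_lists, k=1):
    """Aid list for one datum: hypotheses named by at least k contributors."""
    counts = Counter(h for lst in contributor_lists for h in set(lst))
    return {h for h, c in counts.items() if c >= k}


def suggest(aid_lists, m):
    """Hypotheses appearing on at least m of the aid lists for the observed data."""
    counts = Counter(h for lst in aid_lists for h in set(lst))
    return {h for h, c in counts.items() if c >= m}


if __name__ == "__main__":
    # Hypothetical associations from three contributors for one datum.
    datum_1 = build_aid_list([["A", "B"], ["A"], ["C"]], k=1)  # {A, B, C}
    datum_2 = {"A", "D"}
    datum_3 = {"B", "D", "E"}
    print(sorted(suggest([datum_1], m=1)))                      # ['A', 'B', 'C']
    print(sorted(suggest([datum_1, datum_2, datum_3], m=2)))    # ['A', 'B', 'D']
```

Raising either `k` or `m` shrinks the suggested set, which is the trade-off discussed next.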

By adjusting these two criteria it is possible to increase the number of
relevant hypotheses that the aid retrieves, but at the cost of increasing the
number of unlikely hypotheses that are retrieved. Each time the aid is
implemented these decisions will have to be made. In effect, the mesh size of
the net must be set to determine the minimum size of fish that will be caught.


Using these criteria the aid suggested 32 majors for the example problem
discussed previously. (This happens to be the maximum number of majors
suggested for any problem.) Of the 12 majors that were more likely than 2%, 9
were included on the aid list. The aid did not successfully retrieve Laboratory
Technology (9.3%), Liberal Studies (2.1%), and University College (11.3%), but
it did retrieve hypotheses whose probabilities summed to 67.9%. Had the
criterion for the inclusion of a major on the aid lists been at least two out
of the 18 contributors to the aid, then the aid's performance would have been
about 63%, and the aid would have suggested only 9 hypotheses less likely than
2% rather than 23, as was actually the case.
Results and discussion

Performance on the control task

As the experiment employed two distinct populations of subjects, we included a
control condition to detect possible differences between our non-expert and
expert subjects on a topic that was irrelevant to the experts' specialty. The
task chosen was the "Animal" task, which we felt tapped items of common
knowledge that both groups should have. Thus, differences in performance should
be due to hypothesis generation ability.

The number of animal responses that were consistent with the data was
tabulated for both groups. The mean number of appropriate responses for the
non-experts was 6.16, while the experts achieved a mean of only about 3.1
correct responses (F=6.90; df=1,30; p<.05).

It might be tempting to explain these results using some of the common
prejudices connected with experts, but we believe that another explanation is
more likely. We noticed that the experts' attitude toward these "Animal"
problems was sometimes one of indifference. The experts were recruited with the
idea that their expertise would contribute to the evaluation of the aid. We did
not mention in our recruitment literature that another group of non-expert
subjects would be a part of the experiment, or the purpose of the animal
problems, nor did we volunteer this information unless asked. For these
reasons, some of the experts probably regarded these problems as irrelevant
trivia. The non-expert subjects, by contrast, were mostly advanced psychology
majors who perhaps were hardened to the practices of experimental
psychologists. While this comparison is perhaps flawed for the above reasons,
the results suggest that the expert subjects are no better than the non-experts
in general hypothesis generation ability, a result which will aid the
interpretation of other results.

Unaided performance

For problems where the expertise of the experts was relevant, one might expect
that the experts would show superior performance to non-experts, and in fact
this was the case. We summed the probabilities of all the "Majors" hypotheses
generated by the subjects without the aid that were greater than 2%, for both
groups, using the technique illustrated previously. The mean performance of the
non-experts, expressed as a percentage, was 47.7%, while the mean for the
experts was 50.6%. The difference in performance is 2.9%, and this difference
is statistically reliable (F=4.5; df=1,28; p<.05). This difference, while in
the expected direction, is surprisingly small; we expected a larger difference.

This small difference between experts and non-experts raises some interesting
questions about the role expertise plays in the hypothesis generation process.
Our earlier reports of deficiencies in hypothesis generation using non-expert
subjects have been questioned due to our subjects' lack of expertise. Our
results for the expert and non-expert subjects suggest that subject-matter
expertise is not a potent variable in hypothesis generation, and that
non-expert subjects are satisfactory surrogates for expert subjects. It does
not follow from these results that expertise is largely irrelevant in
hypothesis generation. It may be that expertise in the subject matter of the
task must also be coupled with daily performance of the task for the true
advantage of expertise to become apparent. In any event, these results do
indicate that the surprising deficiencies in hypothesis generation performance
which we have observed previously, and replicate here, are not due to lack of
subject-matter expertise.

Perhaps the most interesting aspect of the unaided performance of both groups
of subjects is its implication for practical decision making. Subjects,
whether expert or non-expert, are not capable of generating an adequate
hypothesis set in this task. While the generality of this effect has not been
completely established, this is cause for alarm. The percentages reported
previously are not arbitrary scores; they reflect the probability that the
subject's list will contain the true hypothesis. In other words, if a subject
earns a score of 50.6%, this means that on the average the true hypothesis
will not be considered on about 50% of the occasions when the subject generates
hypotheses. We wonder whether decision analytic models are robust enough to
tolerate such a high error rate.
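The reading of the score as the probability that the list contains the true hypothesis can be checked by simulation. The sketch below assumes a toy set of veridical probabilities, draws the "true" hypothesis repeatedly from that distribution, and counts how often it appears on a fixed list; the observed rate should approach the summed probability of the listed hypotheses. Because the scoring used here counts only hypotheses more likely than 2%, the score slightly underestimates this containment rate whenever a list also includes rarer hypotheses. The data and the function name are hypothetical.

```python
import random


def containment_rate(hyp_list, probs, trials=100_000, seed=0):
    """Fraction of simulated occasions on which the true hypothesis is on the list.

    The 'true' hypothesis is drawn each time from the veridical distribution
    `probs` (weights are relative, so they need not sum exactly to one).
    """
    rng = random.Random(seed)
    names = list(probs)
    draws = rng.choices(names, weights=[probs[n] for n in names], k=trials)
    listed = set(hyp_list)
    return sum(d in listed for d in draws) / trials


if __name__ == "__main__":
    # Toy veridical probabilities, invented for illustration only.
    probs = {"Math": 0.4, "Physics": 0.3, "Chemistry": 0.2, "Art": 0.1}
    subject_list = ["Math", "Chemistry"]
    # The listed hypotheses sum to 0.6; the simulated rate should be close to it.
    print(containment_rate(subject_list, probs))
```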

There are several possible explanations for inadequate hypothesis generation.
One is that the "majors" task is incredibly difficult. There is an element of
truth in this argument; a modern university is in fact quite complex, and no
single individual can be aware of all the layers of College and Departmental
requirements, recommendations, and student preferences. We believe, however,
that this is a characteristic of many real-world situations that are not
completely understood. In medical diagnosis, for example, a physician's
knowledge is in some sense comparable to that of our experts. Both groups are
dealing with an imperfectly understood environment. Both groups are capable of
dealing with routine problems where standard procedures exist. These routine
problems are not usually the target of decision analysis; decision analysis is
usually employed when our understanding of the problem is imperfect, and it is
in these very problems that one would expect to find deficiencies in hypothesis
generation. Furthermore, our earlier results were not obtained using difficult
problems; instead we used problems which should have been easy for our
subjects.

An alternative to the task-difficulty explanation may actually account for a
larger percentage of the subjects' deficiencies in hypothesis generation. In a
study examining the process by which hypotheses are retrieved from memory,
Gettys, Fisher, and Mehle (1978) found that retrieval processes are
surprisingly inefficient. This result suggests that the memory search process
by which subjects retrieve hypotheses misses many hypotheses that are in memory
but cannot be accessed from the data. If this is the case, then part of the
deficiency in hypothesis generation is due to failure to retrieve information
that the decision maker possesses. This is, of course, exactly the situation
with which the aid is designed to deal, and we would predict from this notion
that the aid should prove effective in prompting the subject's memory.

Aided performance

Performance of subjects on aided problems was also calculated using the same
procedures described previously, except that for aided problems the final list
that the subject generated after using the aid was scored. Mean aided expert
performance was [illegible] and non-expert performance was 57%. The difference
between groups was not reliable (F<2; df=1,28; p>.2), but both groups were
aided significantly by the aid. The experts showed an improvement of 13.3%,
while the non-experts showed an improvement of 13.5% over their unaided
performance. The difference in the improvement in performance was reliable
(F=4.16; df=1,28; p<.05). There was also a reliable effect due to the number
of data. Performance on one-datum problems was 62.2%, while performance on
three-data problems was 55% (F=23.94; df=1,28; p<.01).

The aid does produce the expected gain in performance, a result that is
consistent with the notion that decision makers can recognize, but not always
retrieve, relevant hypotheses. It is interesting to note that the initial
difference between non-experts and experts is reduced by the aid. The aid
enhanced the performance of the non-experts to a greater extent, as might be
expected.

The decision as to whether the aid is worthwhile to implement will depend on
the importance of the gain in performance that it produces. The consequences of
the gain will depend in a complicated way on the decision model which is
appropriate in a given situation. As we did not embed our hypothesis generation
problems in a decision situation, we cannot calculate a gain in potential
payoff from using the aid, nor can we estimate the costs of implementing the
aid in a given situation, except to say that it should be relatively
inexpensive. The aid does seem promising enough to warrant further development
and further study of its utility in other situations.

The results for number of data were as predicted. We hypothesized that when
the hypotheses to be retrieved must be consistent with several data, both the
memory retrieval aid and the subjects would have more difficulty. However, the
effect of number of data interacted significantly with the problem (F=5.44;
df=1,28; p<.05), and we employed only one problem at each level of number of
data. These two considerations suggest that this result should be interpreted
with caution; it may be an effect due to the particular problems chosen for the
experiment.

Potential performance of the aid

Is it possible that most of the hypotheses the subjects generated were
anticipated by the aid? Considerable insight into what actually happened in the
experiment can be gained by "turning the tables" on the subjects and the aid.
Suppose that all of the suggestions of the aid were adopted without assessment,
and the subjects were invited to "aid" the aid. The aid that the subjects would
provide in this situation would be those hypotheses that they retrieved from
their memories that had not been generated by the aid. To perform this
analysis, it is first necessary to calculate how the aid would have performed
without the help of the subjects. The result of this calculation is that the
aid alone performed with a score of 76.0%. The absolute size of this number is
not the major reason why it is impressive; if more, or better, contributors had
been used, it would have been larger. (In fact, had the historical technique
that we used to ascertain the veridical probabilities been used to generate
the lists, the aid would have performed perfectly, achieving a score of about
83%. This percentage is just the sum of the probabilities that are greater than
2%.) The interesting result is the comparison of the aid alone with either the
aided or the unaided subjects, and with the subjects aiding the aid. This
latter result is the simplest, so it will be discussed first.
Table 1

A Comparison Between Unaided And Aided Human Performance
And The Aid By Itself

                    Human              Aid only, using a criterion of:
               Unaided    Aided     1 per 18    2 per 18    3 per 18
Percent         50.6%     67.5%      76.0%       67.1%       62.9%
# Hyp > 2%       3.04      4.71       6.12        4.75        4.25
# Hyp < 2%       3.75      7.53      15.12        7.75        5.00
When the subjects "aided the aid", the gain in performance was less than 1%.
This means that the subjects rarely retrieved hypotheses that were not
retrieved by the aid, which is a powerful testament to the efficiency of the
pooling process.
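The "aiding the aid" comparison amounts to scoring the union of the aid's list and the subject's list and subtracting the aid's own score. Because the score is a sum over distinct hypotheses, this gain equals the score of just those hypotheses the subject retrieved that the aid did not. A minimal sketch, assuming the same list and probability representation and 2% floor as the earlier examples; the names and data are hypothetical.

```python
def score(hypotheses, probs, floor=0.02):
    """P for one list: summed probability of distinct hypotheses above the floor."""
    return sum(p for p in (probs.get(h, 0.0) for h in set(hypotheses)) if p > floor)


def gain_from_subject(aid_list, subject_list, probs):
    """Gain in P when the subject's list is added to the aid's list."""
    return score(set(aid_list) | set(subject_list), probs) - score(aid_list, probs)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    probs = {"Math": 0.40, "Physics": 0.30, "Chemistry": 0.20, "Art": 0.01}
    aid = ["Math", "Physics"]
    subject = ["Math", "Art", "Chemistry"]
    # Only Chemistry is new and above 2%, so the gain equals its probability.
    print(round(gain_from_subject(aid, subject, probs), 2))
```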

The aid also performed better than the unaided subject and the aided subject,
but at the cost of adding unlikely hypotheses to the list of Majors. However,
as mentioned previously, the criteria used by the aid can be adjusted to reduce
the number of "false alarms". These calculations were performed for the aid as
the sole hypothesis generator under various aid criteria. We manipulated the
criterion used to include a Major on the lists in the aid's computer memory,
requiring at least 1, 2, or 3 of the 18 contributors to name a Major for it to
be included on the aid lists. The results of these calculations, and the number
of hypotheses that the aid "adopted" that were greater or less than 2%, are
shown in table 1 with the results of aided and non-aided human performance.
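The calculation behind the aid-only columns of table 1 can be sketched, for a single datum, as a sweep over the inclusion criterion. For each k, the sketch admits hypotheses named by at least k contributors, then reports the summed probability of admitted hypotheses above 2%, the number of such hypotheses, and the number of remaining "false alarms". The single-datum simplification, the treatment of hypotheses absent from the probability table as having zero probability, and all names are assumptions for illustration.

```python
from collections import Counter


def sweep(contributor_lists, probs, criteria=(1, 2, 3), floor=0.02):
    """For each criterion k, score the aid list of hypotheses named by >= k contributors.

    Returns tuples (k, summed probability above the floor, number of hypotheses
    above the floor, number of other admitted hypotheses, i.e. 'false alarms').
    """
    counts = Counter(h for lst in contributor_lists for h in set(lst))
    rows = []
    for k in criteria:
        aid = {h for h, c in counts.items() if c >= k}
        hits = [h for h in aid if probs.get(h, 0.0) > floor]
        rows.append((k, sum(probs[h] for h in hits), len(hits), len(aid) - len(hits)))
    return rows


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    contributors = [["A", "B"], ["A", "C"], ["A", "B", "D"]]
    probs = {"A": 0.50, "B": 0.20, "C": 0.01, "D": 0.05}
    for k, percent, above, below in sweep(contributors, probs):
        print(f"{k} per {len(contributors)}: {100 * percent:.1f}%  >2%: {above}  <2%: {below}")
```

As in table 1, a stricter criterion lowers both the percentage score and the number of false alarms.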


As can be seen from an inspection of table 1, the aid alone with a criterion of
2 out of 18 subjects is clearly superior to the unaided subject. It is also
most interesting that it performs as well as an aided human, achieving about the
same percentage performance with roughly equivalent false alarms.

The conclusion is inescapable. In this situation, at least, the aid could
completely replace the human decision maker with no loss in performance. We do
not seriously advocate such an extreme recommendation at this time, for reasons
of user acceptance, but these results suggest that the subjects would have
contributed little to the performance of the aid had the tables been reversed.


The hypothesis generation aid was shown to enhance the hypothesis generation
performance of both expert and non-expert subjects to a noticeable degree.
These results also demonstrate the potential of creating an artificial computer
memory based on human judgment, which, in this situation at least, can achieve,
by itself, better performance than an unaided human hypothesis generator.